Seeing in Color: Jet Superstructure 
A new class of observables is introduced which aims to characterize the superstructure of an 
event, that is, features, such as color flow, which are not determined by the jet four-momenta alone. 
Traditionally, an event is described as having jets which are independent objects; each jet has some 
energy, size, and possible substructure such as subjets or heavy flavor content. This description 
discards information connecting the jets to each other, which can be used to determine if the jets 
came from decay of a color-singlet object, or if they were initiated by quarks or gluons. An example 
superstructure variable, pull, is presented as a simple handle on color flow. It can be used on an 
event-by-event basis as a tool for distinguishing previously irreducible backgrounds at the Tevatron 
and the LHC. 

Hadron colliders, such as the LHC at CERN, are
fabulous at producing quarks and gluons. At energies
well above the confinement scale of QCD, these colored
objects are produced in abundance, only hadronizing
into color-neutral objects when they are sufficiently far
apart. The observed final-state hadrons collimate into
jets which, at a first approximation, are in one-to-one cor- 
respondence with hard-partons from the short-distance 
interaction. In fact, this description is so useful that 
it is usually possible to treat jets as if they are quarks 
or gluons. Conversely, in a first-pass phenomenological 
study, it is possible simply to simulate the production 
of quarks and gluons, assuming they can be accurately 
reconstructed experimentally from observed jets. 

In certain situations, the jet four-momenta alone do 
not adequately characterize the underlying hard process. 
For example, when an unstable particle with large trans- 
verse momentum decays hadronically, the final state may 
contain a number of nearly collinear jets. These jets may
then be merged by the jet-finder. Or, due to contami- 
nation from the underlying event, the energy of the re- 
constructed jet may not optimally represent the energy 
of the hard parton, thereby obscuring the short-distance 
event topology. Over the last few years, a number of
improved jet algorithms and filtering techniques have been
developed to sharpen the reconstruction of hard scattering
kinematics [1-4], with experimentally endorsed successes
including reviving a Higgs to $b\bar{b}$ discovery channel
at the LHC [1] (implemented by ATLAS [5]) and making
top-tagging as reliable as $b$-tagging [2] (implemented by
CMS [6]). Nevertheless, there is still a wealth of
information in the events which these substructure techniques
ignore. Jets have color, and are color-connected to each
other, providing the event with an observable and
characterizable superstructure.

The term color-connected comes from a graphical picture
of the way SU(3) group indices are contracted in
QCD amplitudes. To be concrete, consider the production
of a Higgs boson at the LHC with the Higgs decaying
to bottom quarks. The hard process is $q\bar{q} \to H \to b\bar{b}$.
Since the Higgs is a color singlet, the color factor in the
leading order matrix element for this production has the
form $\mathrm{Tr}[T^A T^B]\,\mathrm{Tr}[T^C T^D]$, where the $T$ are generators of
the fundamental representation of SU(3), $A$ and $B$ index
the initial state quarks and $C$ and $D$ index the final-state
$b$'s. Since $\mathrm{Tr}[T^C T^D] \propto \delta^{CD}$, the color of $C$ must be the
same as $D$, which can be represented graphically as a
line connecting quark $C$ to quark $D$. This color string
or dipole is shown in Figure 1. An example background
process is $q\bar{q} \to g \to b\bar{b}$. Here, there are two possibilities
for the color connections: $\mathrm{Tr}[T^A T^C]\,\mathrm{Tr}[T^B T^D]$ and
$\mathrm{Tr}[T^A T^D]\,\mathrm{Tr}[T^B T^C]$, both of which connect one incoming
quark to one outgoing quark, as shown also in Figure 1.
The color string picture treats gluons as bifundamentals,
which is correct in the limit of a large number of colors,
$N_c \to \infty$. Subleading corrections are included in
simulations through color-reconnections, which amount
to a $1/N_c^2 \sim 10\%$ effect.

FIG. 1: Possible color connections for signal ($pp \to H \to b\bar{b}$)
and for background ($pp \to g \to b\bar{b}$).

Since color flow is physical, it may be possible to
extract the color connections of an event. Such information
would be complementary to the information in the
jets' four-momenta and therefore may help temper
otherwise irreducible backgrounds. For example, one
application would be in cascade decays from new physics
models. In supersymmetry, one often has a large number
of jets, originating from on-shell decays like $\tilde{q} \to q\chi$ or
from color-singlet gauge boson or gaugino decays. One of
the main difficulties in extracting the underlying physics 
from these decays is the combinatorics: which jets come 
from which decay? Mapping the superstructure color 
connections of the events could then greatly enhance our 
ability to decipher the short-distance physics. 

FIG. 2: Accumulated $p_T$ after showering a particular
partonic phase space point 3 million times. Left has the $b$ and
$\bar{b}$ color-connected to each other (signal) and right has the $b$
and $\bar{b}$ color-connected to the beams (background). Contours
represent factors of 2 increase in radiation.

FIG. 3: Event-by-event density plot of the pull vector of the $b$
jet in polar coordinates. The signal (connected to $\bar{b}$ jet) is on
the left, the background (connected to the left-going, $y = -\infty$
beam) is on the right. 10^ events are shown.



In order to extract the color connections, they must 
persist into the distribution of the observable hadrons. 
The basic intuition for how the color flow might show 
up follows from approximations used in parton show- 
ers [7, 8]. In these simulations, the color dipoles are al- 
lowed to radiate through Markovian evolution from the 
large energy scales associated with the hard interaction 
to the lower energy scale associated with confinement. 
These emissions transpire in the rest frame of the dipole. 
When boosting back to the lab frame, the radiation ap- 
pears dominantly within an angular region spanned by 
the dipole, as indicated by the arrows in Figure 1.
Alternatively, an angular ordering can be enforced on the
radiation (as in HERWIG [9]). The parton shower treatment of
radiation attempts to include a number of features which 
are physical but hard to calculate analytically, such as 
overall momentum and probability conservation or co- 
herence phenomena associated with soft radiation. 

It is more important that these effects exist in data 
than that they are included in the simulation. In fact, 
color coherence effects have already been seen by various
experiments. In $e^+e^-$ collisions, for example, evidence
for color connections between final-state quark and
gluon jets was observed in three jet events by JADE
at DESY [10]. Later, at LEP, the L3 and DELPHI
experiments found evidence for color coherence among
the hadronic decay products of color-singlet objects in
$W^+W^-$ events [11, 12]. Also, in $p\bar{p}$ collisions at the
Tevatron, color connections of a jet to beam remnants have
been observed by DØ in $W$+jet events [13]. All of these
studies used analysis techniques which were very depen- 
dent on the particular event topology. What we will now 
show is that it is possible to come up with a very general 
discriminant which can help determine the color flow of 
practically any event. Such a tool has the potential for 
wide applicability in new physics searches at the LHC. 

As an example, we will use Higgs production in
association with a $Z$. The $Z$ allows the Higgs to have some
$p_T$ so that its $b\bar{b}$ decay products are not back-to-back
in azimuthal angle, $\phi$. Our benchmark calculator will
be MADGRAPH [14] for the matrix elements interfaced to
PYTHIA 8 [15] for the parton shower, hadronization and
underlying event, with other simulations used for
validation.

To begin, we isolate the effect of the color connections
by fixing the parton momentum. We compare
events with $Zb\bar{b}$ in the final state (with $Z \to$ leptons) in
which the quarks are color-connected to each other (signal)
versus color-connected to the beam (background).
In Figure 2, we show the distribution of radiation for
a typical case, where $(y, \phi) = (-0.5, -1)$ for one $b$ and
$(y, \phi) = (0.5, 1)$ for the other, with $p_T = 200$ GeV for
each $b$, where $y$ is the rapidity. For this figure, we have
showered and hadronized the same parton-level configuration
over and over again, accumulating the $p_T$ of the
final-state hadrons in $0.1 \times 0.1$ bins in $y$-$\phi$ space. The
color connections are unmistakable.
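
The accumulation behind Figure 2 can be mimicked with a weighted
two-dimensional histogram. The following is a minimal sketch, not the
code used for the figure: the function name `accumulate_pt`, the grid
ranges, and the toy hadrons are illustrative assumptions, and a real
study would fill the arrays from repeated parton-shower runs of the
same parton-level configuration.

```python
import numpy as np

def accumulate_pt(showers, cell=0.1, y_range=(-3.0, 3.0), phi_range=(-3.2, 3.2)):
    """Sum hadron pT from many showers into cell x cell bins in (y, phi).

    `showers` is an iterable of (y, phi, pt) array triples, one per shower.
    The grid ranges are illustrative assumptions, not values from the text.
    """
    n_y = int(round((y_range[1] - y_range[0]) / cell))
    n_phi = int(round((phi_range[1] - phi_range[0]) / cell))
    y_edges = np.linspace(y_range[0], y_range[1], n_y + 1)
    phi_edges = np.linspace(phi_range[0], phi_range[1], n_phi + 1)
    grid = np.zeros((n_y, n_phi))
    for y, phi, pt in showers:
        # Each hadron contributes its pT to the cell that contains it.
        hist, _, _ = np.histogram2d(y, phi, bins=[y_edges, phi_edges], weights=pt)
        grid += hist
    return grid, y_edges, phi_edges

# Toy input: two "showers" with made-up hadrons (y, phi, pT in GeV).
toy_showers = [
    (np.array([0.45, -0.55]), np.array([0.95, -1.05]), np.array([1.0, 3.0])),
    (np.array([0.45, 0.15]), np.array([0.95, 0.35]), np.array([2.0, 0.5])),
]
grid, y_edges, phi_edges = accumulate_pt(toy_showers)
print(grid.shape, grid.sum())
```

Contours such as those in Figure 2 would then be drawn on `grid`; hadrons
falling outside the chosen ranges are simply dropped by the histogram.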

The superstructure feature of the jets in Figure 2 that 
we want to isolate is that the radiation in each signal jet 
tends to shower in the direction of the other jet, while in 
the background it showers mostly toward the beam. In 
other words, the radiation on each end of a color dipole 
is being pulled towards the other end of the dipole. This 
should therefore show up in a dipole-type moment
constructed from the radiation in or around the individual
jets. For dijet events, like those shown in Figure 2, one 
could imagine constructing a global event shape from 
which the moment could be extracted. However, a lo- 
cal observable, constructed only out of particles within 
the jet, has a number of immediate advantages. For one, 
it will be a more general-purpose tool, applying to events 
with any number of jets. It should also be easier to cali- 
brate on data, since jets are generally better understood 
experimentally than global event topologies. Therefore, 
as a first attempt at a useful superstructure variable, we 
construct an observable out of only the particles within 
the jets themselves. 

FIG. 4: Distribution of the pull angle (for the $b$ jet) with
$\Delta y_{b\bar{b}} \sim 1$ and $\Delta\phi_{b\bar{b}} \sim 2$, for signal and background, showered
10^ times with different Monte Carlos.

FIG. 5: Pull angles in the $b$ or $\bar{b}$ jet in $HZ \to Zb\bar{b}$ signal
events and their $Z + b\bar{b}$ backgrounds. For each event, $\Delta\theta_t = 0$
is defined to point toward the other $b$ jet. 3 x 10^ events are
shown.

In constructing a jet moment, there are a number of
ways to weight the momentum, such as by energy or $p_T$,
and to define the center of the jet. These are all basically
the same, but we have found that the most effective
combination is a $p_T$-weighted vector, which we call pull,
defined by

$$\vec{t} = \sum_{i \in \mathrm{jet}} \frac{p_T^i\,|r_i|}{p_T^{\mathrm{jet}}}\,\vec{r}_i. \qquad (1)$$

Here, $\vec{r}_i = (\Delta y_i, \Delta\phi_i) = \vec{c}_i - \vec{J}$, where $\vec{J} = (y_J, \phi_J)$ is
the location of the jet and $\vec{c}_i$ is the position of a cell or
particle with transverse momentum $p_T^i$. Note that we
use rapidity $y_J$ for the jet instead of pseudorapidity ($\eta_J$);
because the jet is massive this makes $\vec{r}_i$ boost invariant
and a better discriminant (rapidity and pseudorapidity
are equivalent for the effectively massless cells/particles,
$c_i$). The centroid (Eq. (1) without the $|r_i|$ factor) is
usually almost identical to $\vec{J}$, the location of the jet
four-vector in the $E$-scheme (the sum of four-momenta of the
jet constituents).
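
As an illustration, the pull vector of Eq. (1) can be evaluated from a list
of jet constituents in a few lines. This is a minimal sketch under stated
assumptions: the function names `wrap_phi` and `pull_vector` are
hypothetical, the jet axis and jet $p_T$ are supplied by the caller rather
than taken from a jet finder, and the toy constituents are made up.

```python
import numpy as np

def wrap_phi(dphi):
    """Map azimuthal differences into [-pi, pi)."""
    return (np.asarray(dphi) + np.pi) % (2.0 * np.pi) - np.pi

def pull_vector(y, phi, pt, jet_y, jet_phi, jet_pt):
    """Pull vector t = sum_i pT_i |r_i| r_i / pT_jet, with r_i = (dy_i, dphi_i)."""
    y, phi, pt = (np.asarray(a, dtype=float) for a in (y, phi, pt))
    dy = y - jet_y                    # rapidity offset from the jet axis
    dphi = wrap_phi(phi - jet_phi)    # azimuthal offset, wrapped at +-pi
    r = np.hypot(dy, dphi)            # |r_i|
    w = pt * r / jet_pt               # weight, linear in the constituent pT
    return np.array([np.sum(w * dy), np.sum(w * dphi)])

# Toy jet: a hard core on the axis plus one softer particle off-axis.
# The axis and jet pT are set by hand here, purely for illustration.
t = pull_vector(y=[0.0, 0.3], phi=[0.0, 0.4], pt=[90.0, 10.0],
                jet_y=0.0, jet_phi=0.0, jet_pt=100.0)
print(t)
```

The azimuthal wrap matters for jets near $\phi = \pm\pi$, where a naive
difference would place constituents almost $2\pi$ away from the axis.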

An important feature of the pull vector $\vec{t}$ is that it
is infrared safe. If a very soft particle is added to the
jet, it has negligible $p_T$, and therefore a negligible effect
on $\vec{t}$. Moreover, since pull is linear in $p_T$, if a particle
splits into two collinear particles at the same $\vec{r}$, the pull
vector is also unchanged. This property guarantees that
pull should be fairly insensitive to fine details of the
implementation, such as the spatial granularity or energy
resolution of the calorimeters.
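
Both statements are easy to check numerically. The sketch below is an
illustration on made-up constituents, not a proof: it assumes the jet axis
and jet $p_T$ are held fixed when the soft particle is added or the
collinear splitting is made, and the helper `pull_vector` is a hypothetical
name.

```python
import numpy as np

def pull_vector(dy, dphi, pt, jet_pt):
    """Pull from constituent offsets (dy, dphi) relative to a fixed jet axis."""
    dy, dphi, pt = (np.asarray(a, dtype=float) for a in (dy, dphi, pt))
    w = pt * np.hypot(dy, dphi) / jet_pt
    return np.array([np.sum(w * dy), np.sum(w * dphi)])

# Toy constituents (made up): offsets from the jet axis and pT in GeV.
dy = np.array([0.05, -0.20, 0.30])
dphi = np.array([0.10, 0.15, -0.25])
pt = np.array([60.0, 30.0, 10.0])
jet_pt = 100.0  # held fixed below, an idealization

t_ref = pull_vector(dy, dphi, pt, jet_pt)

# Soft emission: add a particle with tiny pT somewhere in the jet.
t_soft = pull_vector(np.append(dy, 0.6), np.append(dphi, -0.5),
                     np.append(pt, 1e-9), jet_pt)

# Collinear splitting: replace the second particle by two particles at the
# same position that share its pT (fractions z and 1 - z).
z = 0.3
dy_c = np.array([dy[0], dy[1], dy[1], dy[2]])
dphi_c = np.array([dphi[0], dphi[1], dphi[1], dphi[2]])
pt_c = np.array([pt[0], z * pt[1], (1.0 - z) * pt[1], pt[2]])
t_coll = pull_vector(dy_c, dphi_c, pt_c, jet_pt)

print(t_ref, t_soft - t_ref, t_coll - t_ref)
```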

The event-by-event distribution of the pull for the left
$b$ jet from Figure 2 is shown in Figure 3 in polar
coordinates, $\vec{t} = (|\vec{t}|\cos\theta_t, |\vec{t}|\sin\theta_t)$, where $\theta_t = 0$ points
towards the right-going beam, $\theta_t = \pm\pi$ points towards
the left-going beam, and $\theta_t \approx 0.7$ toward the other $b$ jet.
This figure shows density plots of the $\vec{t}$ distributions on
an event-by-event basis for the signal and background
cases for this particular fixed parton-level phase space
point. For this figure, we use as input the four-momenta
of all long-lived observable particles. If we instead use
the hadronic energy in $0.1 \times 0.1$ cells treated as massless
four-vectors, the distribution of pull vectors is nearly
identical.



We can see that most of the discriminating information
is in the pull angle, $\theta_t$, rather than the magnitude
$|\vec{t}|$. This leads to Figure 4, which shows the
distribution of the pull angle for the signal and the
background in this particular kinematic configuration. This
figure also shows that the pull vector is not particularly
sensitive to the Monte Carlo program used to generate
the sample; the pull angle distributions for HERWIG++
2.4.2 [9], PYTHIA 8.130 [15], and PYTHIA 6.420 with the
$p_T$-ordered shower [7] are all quite similar.
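
Extracting the pull angle from a pull vector amounts to a two-argument
arctangent. The sketch below assumes the component ordering
$\vec{t} = (t_y, t_\phi)$, so that $\theta_t = 0$ points along increasing
rapidity; the helper names and the toy numbers are hypothetical. The second
function measures the angle relative to the direction of another jet, which
is one way to define an angle that vanishes when the pull points at that
jet.

```python
import numpy as np

def pull_angle(t):
    """Polar angle of t = (t_y, t_phi); 0 points along +y, +-pi along -y."""
    return np.arctan2(t[1], t[0])

def relative_pull_angle(t, jet, other_jet):
    """Pull angle measured from the direction toward another jet.

    `jet` and `other_jet` are (y, phi) pairs. The result lies in [-pi, pi),
    with 0 meaning that the pull points straight at `other_jet`.
    """
    dy = other_jet[0] - jet[0]
    dphi = (other_jet[1] - jet[1] + np.pi) % (2.0 * np.pi) - np.pi
    toward = np.arctan2(dphi, dy)
    return (pull_angle(t) - toward + np.pi) % (2.0 * np.pi) - np.pi

# Toy values, not taken from the figures.
t = np.array([0.0, 0.02])
theta = pull_angle(t)
dtheta = relative_pull_angle(t, jet=(0.0, 0.0), other_jet=(1.0, 1.0))
print(theta, dtheta)
```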

The previous three figures all have the parton momen- 
tum fixed. Similar distributions result from other phase 
space points. We fixed the parton momentum to show 
the usefulness of pull in situations which would be indis- 
tinguishable using the jet four-momenta alone. This ex- 
ercise controls for correlations between pull and matrix- 
element-level kinematic discriminants. Also, note that 
there is another possible color-flow for the background 
events, where the left-going jet is color-connected to the 
right-going beam. Then, the most-likely pull angle would 
be more similar to the signal. Fortunately, this only oc- 
curs about 10% of the time for the dominant background. 

The next step is to see if pull is useful given the
full distribution of signal and background events at the
LHC. The pull angle for the full $ZH \to Zb\bar{b}$ signal and
$Zb\bar{b}$ backgrounds still presents a strong discriminant, as
can be seen in Figure 5. Here, we have performed a
full simulation with MADGRAPH 4.4.26 [14] and PYTHIA
8.130 [15], including underlying event and hadronization.
We choose a parton-level cut of $p_T > 15$ GeV for the
$b$ quarks, find the jets with the anti-$k_T$ algorithm with
$R = 0.7$, require the reconstructed mass to be within a
20 GeV window around the Higgs mass (120 GeV), and
construct the pull angle on the radiation within each jet.
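
The mass requirement in this selection can be sketched as follows. This is
an illustration only: the helper names are hypothetical, the jets are given
as toy $(p_T, y, \phi, m)$ values rather than the output of a jet finder,
and reading "a 20 GeV window" as $\pm 10$ GeV around 120 GeV is an
assumption.

```python
import numpy as np

def four_vector(pt, y, phi, m=0.0):
    """(E, px, py, pz) from transverse momentum, rapidity, azimuth and mass."""
    mt = np.hypot(pt, m)  # transverse mass
    return np.array([mt * np.cosh(y), pt * np.cos(phi),
                     pt * np.sin(phi), mt * np.sinh(y)])

def invariant_mass(p):
    """Invariant mass of a four-vector (E, px, py, pz)."""
    m2 = p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2
    return np.sqrt(max(m2, 0.0))

def in_mass_window(jet1, jet2, m_higgs=120.0, full_width=20.0):
    """Return (passes, m_jj) for a dijet pair; all masses in GeV.

    The window is taken as |m_jj - m_higgs| < full_width / 2, an assumed
    reading of "a 20 GeV window around the Higgs mass".
    """
    m_jj = invariant_mass(four_vector(*jet1) + four_vector(*jet2))
    return abs(m_jj - m_higgs) < full_width / 2.0, m_jj

# Toy massless jets given as (pT, y, phi); the values are made up.
passed, m_jj = in_mass_window((60.0, -0.5, -1.0), (60.0, 0.5, 1.0))
print(passed, m_jj)
```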

Next, let us consider some other possibilities. It is
natural to look at higher moments, such as those contained
in the covariance tensor

$$C = \sum_{i \in \mathrm{jet}} \frac{p_T^i\,|r_i|}{p_T^{\mathrm{jet}}} \begin{pmatrix} \Delta y_i^2 & \Delta y_i \Delta\phi_i \\ \Delta\phi_i \Delta y_i & \Delta\phi_i^2 \end{pmatrix}. \qquad (2)$$

The eigenvalues $a > b$ of this tensor are similar to the
semimajor and semiminor axes of an elliptical jet. The
overall size of these, $g = \sqrt{a^2 + b^2}$, provides a decent
characterization of whether the jet is initiated by a quark or
gluon. Gluon jets, since they cap two color dipoles,
generally have more radiation and lead to jets with larger
values of $g$. However, $g$ is strongly correlated with the mass
of a jet and the mass-to-$p_T$ ratio. Since mass and $p_T$ are
contained in the jet four-momentum, this measure of size
is not likely to provide a new handle for irreducible
backgrounds. Other combinations of second-moment
eigenvalues, such as the eccentricity $e = \sqrt{a^2 - b^2}/a$ or
orientation of the ellipse, seem much less useful. While one
might expect gluon jets to be fairly elliptical, due to their
being pulled in two directions, in fact quarks turn out to
be equally elliptical; we have not found a significant
difference in the eccentricity of quark and gluon jets. Going
to third or higher moments is straightforward, but serves
no immediate purpose.
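
For concreteness, the second-moment quantities discussed above can be
computed as in the sketch below. It assumes the tensor uses the same
$p_T^i |r_i| / p_T^{\mathrm{jet}}$ weight as the pull vector; the function
name `shape_moments` and the toy constituents are hypothetical.

```python
import numpy as np

def shape_moments(dy, dphi, pt, jet_pt):
    """Covariance tensor C, eigenvalues a >= b, size g and eccentricity e."""
    dy, dphi, pt = (np.asarray(x, dtype=float) for x in (dy, dphi, pt))
    w = pt * np.hypot(dy, dphi) / jet_pt   # same weight as in the pull vector
    C = np.array([[np.sum(w * dy * dy), np.sum(w * dy * dphi)],
                  [np.sum(w * dphi * dy), np.sum(w * dphi * dphi)]])
    b, a = np.linalg.eigvalsh(C)           # ascending order, so a >= b
    g = np.hypot(a, b)                     # g = sqrt(a^2 + b^2)
    e = np.sqrt(a * a - b * b) / a if a > 0 else 0.0
    return C, a, b, g, e

# Toy jet with two constituents offset along y and along phi (made up).
C, a, b, g, e = shape_moments(dy=[0.1, 0.0], dphi=[0.0, 0.2],
                              pt=[50.0, 50.0], jet_pt=100.0)
print(a, b, g, e)
```

Because the weights are non-negative, the tensor is positive semidefinite
and the eccentricity defined this way lies between 0 and 1.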

We conclude that the pull angle is the most useful 
moment-type observable for determining the color su- 
perstructure of an event. Besides moments, one could at- 
tempt to use more global observables, such as the amount 
of radiation around or between jets. As we have men- 
tioned, such an approach is in principle promising, but 
the analysis would have to be very process-dependent. A 
nice feature of pull is its universality. Although we have 
used as a canonical example Higgs production in associ- 
ation with a Z boson, the pull angle can be used to char- 
acterize any process with jets, such as cascade decays in 
supersymmetry or resonance decays in composite models. 
In fact, for practically any new physics scenario involving 
jets, finding the color connections would be very helpful, 
and the pull angle provides a simple tool to extract this 
information. 

In order to apply superstructure variables to new
physics searches, it will be critical to first validate them
on standard model data. One useful class of events is
$t\bar{t}$ production. For semileptonic $t\bar{t}$ decays, we can get
an arbitrarily clean sample by tightening the $b$-tags, top
mass window, and leptonic $W$ reconstruction. This will
give us a pure sample of hadronic, boosted $W$ bosons.
The two light quark jets from the $W$ decay should be
color-connected, and the pull angle of each quark can be
measured on data. The same sample also provides $b$ jets
connected to the beam. We have tested this idea in
simulations of $t\bar{t}$ events, and have found that the pull angle
distribution in the hadronic $W$ decay products is in fact
similar to that of the Higgs decay in Figure 5.

Finally, let us say a few words about the choice
of jet algorithm. Using the program FASTJET v2.4 [16]
for jet finding, we found that the anti-$k_T$ [17] algorithm,
which takes radiation from more circular regions,
gives better results than $k_T$ [18], SISCone [19], or
Cambridge/Aachen [20]. It is also possible to find the jets
with one algorithm and size, say $R = 0.7$, and then use
a larger size, say $R = 1.2$, to calculate the moment. We
have not found an obvious improvement from doing this,
but such possibilities should be explored. For example, if
the pull angle were to be used by an experimental
collaboration in a Higgs search, a few percent improvement could
probably be gained by optimizing the algorithm in
coordination with the detailed experimental parameters. It
would also be worth investigating whether jet filtering [1]
or trimming [4] could help make pull or other superstructure
variables even more discriminating. Although there
is still a lot of room for improvement, it is clear that color 
flows and jet superstructure can be useful observables at 
hadron colliders, and are worth understanding better. 